Abstract

We introduce the notion of a set of prohibitions and give definitions of a complete set and of a crucial word with respect to a given set of prohibitions. We consider three particular sets which appear in different areas of mathematics and, for each of them, examine the length of a crucial word. One of these sets is proved to be incomplete. The problem of determining the lengths of words that are free from a set of prohibitions is shown to be NP-complete, although the related problem of whether or not a given set of prohibitions is complete is known to be effectively solvable.

1 Introduction and Background 

In defining or characterising sets of objects in discrete mathematics, "languages of prohibitions" are often used: a class of objects is defined by listing the prohibited subobjects, that is, those subobjects which are not contained in the objects of the class. To this end the notion of a subobject is defined in different ways, depending on the set under consideration: subwords for partially bounded languages, subgraphs for families of graphs, and so on. One of the classes of interest that have appeared and are studied in different areas of mathematics is the class of nonrecurrent symbolic sequences, defined by prohibiting strong periodicity in them or, more exactly, by prohibiting the repetition of subwords in these symbolic sequences, for example subwords of type $XX$.

In this paper we consider three types of "prohibitions" connected with a generalisation of the notion of nonrecurrent symbolic sequences, and for each of these sets we study the structure of crucial words and find their lengths. In Section 5 we investigate the problem of determining the lengths of words that are free from an arbitrary given set of prohibitions. We show that this problem is NP-complete, although the related problem of whether or not a given set of prohibitions is complete is known to be effectively solvable.

Let $A = \{a_1, \ldots, a_n\}$ be an alphabet of $n$ letters. A word in the alphabet $A$ is a finite sequence of letters of the alphabet. Any $i$ consecutive letters of a word $X$ form a subword of length $i$. If $X$ is a subword of a word $Y$, we write $X \subseteq Y$.



The set $A^*$ is the set of all words in the alphabet $A$. Let $S \subseteq A^*$. Then $S$ is called a set of prohibited words, or a set of prohibitions. A word that does not contain any word from $S$ as a subword is said to be free from $S$. The set of all words that are free from $S$ is denoted by $\overline{S}$.

Example 1. Let $A = \{a, b\}$ and let the set of prohibitions be $S = \{aa, ba\}$. The word $abbb$ is in $\overline{S}$.

If there exists $k \in \mathbb{N}$ such that the length of every word in $\overline{S}$ is less than $k$, then $S$ is called a complete set.

Example 2. Let $A = \{1, 2, 3, 4\}$ and let the set of prohibitions be
$S = \{123, 13, 14, 11, 22, 33, 44\}$.
Then $S$ is incomplete, since the word $(124)^k = 124124 \ldots 124$ of length $3k$ is in $\overline{S}$ for every $k$.

Example 3. Let $A = \{1, 2, 3\}$ and let the set of prohibitions be

$S = \{12, 23, 31, 32, 11, 22, 33\}$.
It is easy to check that $S$ is complete: the longest word in $\overline{S}$ is $213$.
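The definitions above are easy to experiment with. The following Python sketch is illustrative only: letters are assumed to be single characters, and the helper names `is_free` and `free_words` are hypothetical. It checks Examples 1 and 2 (the latter for a range of $k$) and lists every word free from the set of Example 3. Since every subword of a free word is again free, finding no free word of some length $m$ already shows that a set is complete.

```python
from itertools import product

def is_free(word, prohibited):
    """True if no prohibited word occurs in `word` as a subword (consecutive letters)."""
    return not any(p in word for p in prohibited)

def free_words(alphabet, prohibited, max_len):
    """All nonempty words of length <= max_len over `alphabet` that are free from `prohibited`."""
    found = []
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if is_free(word, prohibited):
                found.append(word)
    return found

EX1 = {"aa", "ba"}                                # Example 1
EX2 = {"123", "13", "14", "11", "22", "33", "44"}  # Example 2
EX3 = {"12", "23", "31", "32", "11", "22", "33"}   # Example 3

print(is_free("abbb", EX1))                                # True
print(all(is_free("124" * k, EX2) for k in range(1, 50)))  # True
print(free_words("123", EX3, 4))                           # ['1', '2', '3', '13', '21', '213']
```

No word of length 4 is free from the set of Example 3, so no longer word can be free either.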

A word $X \in \overline{S}$ is called a crucial word (with respect to $S$) if the word $Xa_i$ contains a prohibited subword for every letter $a_i \in A$. This means that $Xa_i$ has the structure $BB_ia_i$, where $B$ is some word and $B_ia_i \in S$. The subword $B_i$ is called the $i$-ending of the crucial word $X$. If for each letter of the alphabet we consider the minimal $i$-ending (with respect to inclusion), we obtain a system of included $i$-endings, which we will use to investigate crucial words.

Example 4. A = {a, b, c}. The set of prohibitions is S = {aa, cab, acac}. 
The word abaca is crucial with respect to S. 

A crucial word of minimal (maximal) length, if it exists, is called a minimal 
(maximal) crucial word. 

Example 5. Let $A = \{a, b, c\}$ and let the set of prohibitions be $S = \{aa, cab, acac\}$. The word $aca$ is a minimal crucial word with respect to $S$. There are no maximal crucial words, since the word $b^k aca = b \ldots b\,aca$ (with $k$ letters $b$) is crucial for all $k \in \mathbb{N}$.
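A brute-force check of cruciality for Examples 4 and 5 can be sketched as follows, assuming single-character letters; `is_crucial` and `minimal_crucial` are hypothetical helper names, and the search is exponential in the word length, so it is only meant for tiny alphabets.

```python
from itertools import product

def is_free(word, prohibited):
    """True if no prohibited word occurs in `word` as a subword."""
    return not any(p in word for p in prohibited)

def is_crucial(word, prohibited, alphabet):
    """`word` is free from S, but `word + a` is not free for every letter a."""
    return is_free(word, prohibited) and all(
        not is_free(word + a, prohibited) for a in alphabet)

def minimal_crucial(prohibited, alphabet, max_len):
    """First crucial word in order of increasing length, or None if none up to max_len."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if is_crucial(word, prohibited, alphabet):
                return word
    return None

S = {"aa", "cab", "acac"}
print(is_crucial("abaca", S, "abc"))                                  # True (Example 4)
print(minimal_crucial(S, "abc", 6))                                   # aca (Example 5)
print(all(is_crucial("b" * k + "aca", S, "abc") for k in range(10)))  # True
```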

Let $L_{\min}(S)$ ($L_{\max}(S)$) denote the length of a minimal (maximal) crucial word with respect to $S$.

In this paper we consider three sets of prohibitions, denoted $S_1^n$, $S_2^n$ and $S_3^{n,k}$. Here $n$ is the number of letters of the alphabet under consideration and $k$ is a natural number.

We now give the definitions of these sets: 

$S_1^n = \{XX \mid X \in A^*\}$, that is, we prohibit the repetition of two equal consecutive subwords.
$S_2^n = \{XY \mid V(X) = V(Y)\}$, where $V(X) = (v_1(X), \ldots, v_n(X))$ is the content vector of $X$, in which $v_i(X)$ is the number of occurrences of the letter $a_i$ in $X$. That is, we prohibit the repetition of two consecutive subwords with the same content.

$S_3^{n,k} = \{XY \mid d(X, Y) \le k,\ |X| = |Y| \ge k + 1\}$, $k \in \mathbb{N}$, where $d(X, Y)$ is the number of positions in which the words $X$ and $Y$ differ (the Hamming metric) and $|X|$ is the length of the word $X$. That is, we prohibit any two consecutive subwords of length greater than $k$ which differ in at most $k$ positions.
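Because $S_1^n$, $S_2^n$ and $S_3^{n,k}$ are infinite, freeness from them is tested by examining every factor $XY$ with $|X| = |Y|$ rather than by listing prohibited words. A minimal Python sketch, assuming words are Python strings of single-character letters; the function names are illustrative only:

```python
from collections import Counter

def halves(word):
    """Yield (X, Y) for every factor XY of `word` with |X| = |Y| >= 1."""
    n = len(word)
    for start in range(n):
        for h in range(1, (n - start) // 2 + 1):
            yield word[start:start + h], word[start + h:start + 2 * h]

def free_from_S1(word):
    """No factor XX."""
    return all(x != y for x, y in halves(word))

def free_from_S2(word):
    """No factor XY with equal content vectors V(X) = V(Y)."""
    return all(Counter(x) != Counter(y) for x, y in halves(word))

def free_from_S3(word, k):
    """No factor XY with |X| = |Y| >= k + 1 and Hamming distance d(X, Y) <= k."""
    return all(len(x) < k + 1 or sum(a != b for a, b in zip(x, y)) > k
               for x, y in halves(word))

print(free_from_S1("abcbac"), free_from_S2("abcbac"))      # True False
print(free_from_S3("112211", 1), free_from_S3("1121", 1))  # True False
```

The word $abcbac$ contains no square but is itself the abelian square $(abc)(bac)$, which shows that $S_2^n$ prohibits strictly more than $S_1^n$.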

The proofs of the theorems in this paper consist of constructions of extremal crucial words and proofs of their optimality, i.e. of a lower bound for $L_{\min}(S)$ and an upper bound for $L_{\max}(S)$.

2 The Set of Prohibitions $S_1^n$

Theorem 1. We have

$L_{\min}(S_1^n) = 2^n - 1$.

Proof. We define a crucial word $X$ by induction:

$X_1 = a_1$, $X_i = X_{i-1} a_i X_{i-1}$, $X = X_n$.

From this construction it follows that $|X| = 2^n - 1$. We will prove that $X$ is a minimal crucial word with respect to $S_1^n$.
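The construction of Theorem 1 can be checked mechanically for small $n$. In this sketch the helper names are hypothetical and the letters $a_1, a_2$ and so on are encoded as the characters `'1'`, `'2'`, ..., so it assumes $n \le 9$. It builds $X_n$, confirms that it is crucial with respect to $S_1^n$, and computes the shortest crucial length by exhaustive search for $n \le 3$.

```python
from itertools import product

def is_square_free(word):
    """Brute force: no factor of the form XX."""
    n = len(word)
    return all(word[s:s + h] != word[s + h:s + 2 * h]
               for s in range(n) for h in range(1, (n - s) // 2 + 1))

def is_crucial_S1(word, alphabet):
    """`word` avoids S_1^n, and `word + a` contains a square for every letter a."""
    return is_square_free(word) and all(not is_square_free(word + a) for a in alphabet)

def zimin(n):
    """X_1 = a_1, X_i = X_{i-1} a_i X_{i-1}, with a_i encoded as the character str(i)."""
    x = ""
    for i in range(1, n + 1):
        x = x + str(i) + x
    return x

def shortest_crucial_length_S1(n):
    """Exhaustive search over words of increasing length (tiny n only)."""
    alphabet = "".join(str(i) for i in range(1, n + 1))
    length = 1
    while not any(is_crucial_S1("".join(w), alphabet)
                  for w in product(alphabet, repeat=length)):
        length += 1
    return length

print(zimin(3))                                                            # 1213121
print(all(is_crucial_S1(zimin(n), "123456789"[:n]) for n in range(1, 6)))  # True
print([shortest_crucial_length_S1(n) for n in (1, 2, 3)])                  # [1, 3, 7]
```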

Let U be an arbitrary minimal crucial word. We show that U coincides with 
the word X up to a permutation of letters in A. 

From the definition of a crucial word it follows that in the word $Ua_i$ there is a prohibited word of the form $B_i a_i B_i a_i$, where $B_i$ is a certain word and $B_i a_i B_i a_i$ is an ending of the word $Ua_i$ (the ending may coincide with $Ua_i$). In this case the $i$-ending is the subword $B_i a_i B_i$. Let $\ell_i = B_i a_i B_i$.

We assume that $\ell_1 \subset \ell_2 \subset \ldots \subset \ell_n$, since we can obtain such an ordering by permuting the letters of the alphabet, which obviously does not affect the cruciality and minimality of a word.

Note that the minimal crucial word $U$ has the form

$U = B_n a_n B_n = B_n a_n Y_n a_1$,

where $Y_n$ is a certain word. Indeed, if there were a nonempty word to the left of $B_n a_n B_n$, this would contradict the minimality of the crucial word, and if $a_k$ ($k > 1$) stood instead of $a_1$, this would contradict $\ell_1 \subset \ell_k$.

We show that $\ell_{n-1}$ coincides with $B_n$. We have $\ell_{n-1} = B_{n-1} a_{n-1} B_{n-1}$; suppose that $a_n B_n$ is a subword of $\ell_{n-1}$. Then $\ell_{n-1}$ has the form $K a_n P a_{n-1} K a_n P$, where $K a_n P = B_{n-1}$, but then

$\ell_n = P a_{n-1} K a_n P \, a_n \, P a_{n-1} K a_n P$, where $P a_{n-1} K a_n P = B_n$,
and the word $U$ contains the prohibited subword $a_n P a_n P$. This cannot be the case. It means that $\ell_{n-1}$ is a subword of the word $B_n$, and the word $U$ has the form

$U = \ell_n = Z_n \ell_{n-1} a_n Z_n \ell_{n-1}$,

where $Z_n$ is a certain word. Since we are dealing with a minimal crucial word, we have $Z_n = \emptyset$ (the empty word), and then $B_n = \ell_{n-1}$. In the same way we can show that $B_i = \ell_{i-1}$ for each $i = 2, \ldots, n-1$, and that $B_1 = \emptyset$.

Hence the structure of a minimal crucial word U coincides with that of the 
word X as required. □ 

Remark 2. From the proof of Theorem 1 it follows that the word $X$ is the unique minimal crucial word up to a permutation of the letters of the alphabet $A$.

3 The Set of Prohibitions $S_2^n$

Proposition 3. A minimal crucial word (with respect to $S_2^n$) cannot have three letters each of which appears twice in the word.

Proof. Since the proposition is obviously true for $|A| = 1, 2, 3$, we consider the case $|A| \ge 4$.

Let $X$ be a minimal crucial word, and suppose the system of included $i$-endings for it is $\ell_1 \subset \ell_2 \subset \ldots \subset \ell_n = X$. Suppose the letters $a_{i_1}$, $a_{i_2}$, $a_{i_3}$ occur twice in $X$ and that $i_1 < i_2 < i_3 < n$ (the fact that $i_1, i_2, i_3$ differ from $n$ follows from the fact that $a_n$ must occur an odd number of times).

Recall that, since $\ell_j a_j \in S_2^n$, every letter $a_m$ with $m \ne j$ occurs in $\ell_j$ an even number of times, while $a_j$ occurs in $\ell_j$ an odd number of times. When we pass from $\ell_{i_3 - 1}$ to $\ell_{i_3}$ ($\ell_{i_3 - 1}$ is defined, since $i_1, i_2 < i_3$), a letter $a_{i_3}$ must appear, and when we pass from $\ell_{i_3}$ to $\ell_{i_3 + 1}$ ($\ell_{i_3 + 1}$ is defined, since $i_3 < n$), one more letter $a_{i_3}$ must appear. Hence, since there are only two letters $a_{i_3}$ in $X$, there are no letters $a_{i_3}$ in $\ell_j$ for $j < i_3$; in particular, both letters $a_{i_3}$ lie in $X$ to the left of $\ell_{i_2}$.

Obviously, the letter $a_{i_1}$ must be in $\ell_{i_1}$. The second letter $a_{i_1}$ appears when we pass from $\ell_{i_1}$ to $\ell_{i_2}$. Since there are only two letters $a_{i_1}$, there are no letters $a_{i_1}$ in the word $X$ to the left of $\ell_{i_2}$.

If we write the letter $a_{i_3 + 1}$ to the right of the word $X$, we obtain a prohibited word (a word from $S_2^n$), namely $\ell_{i_3 + 1} a_{i_3 + 1}$. Words from $S_2^n$ are divided into two parts which have the same content. Obviously, the two letters $a_{i_3}$ must lie in different parts of this prohibited word, and so must the two letters $a_{i_1}$. This is impossible, since both letters $a_{i_3}$ lie strictly to the left of both letters $a_{i_1}$, and this contradicts the assumption. □

Remark 4. From the proof of Proposition 3 it follows that if letters $a_i$ and $a_j$ occur twice in a word $X$ (in which $\ell_1 \subset \ell_2 \subset \ldots \subset \ell_n = X$), then either $i = j + 1$ or $j = i + 1$.
Theorem 5. For any $n > 2$ we have

$L_{\min}(S_2^n) = 4n - 7$.

Proof. Note that a natural approach to the construction of a crucial word is possible. It consists of an algorithm of step-by-step optimisation: to a crucial word over an $n$-letter alphabet we add the minimum number of letters needed to obtain a crucial word over an $(n + 1)$-letter alphabet.

The algorithm can be written recursively in the following way:

$X_n = B_{n-1} a_n X_{n-1}$,

$B_{n-1} = B_{n-3} a_{n-1} B_{n-3}$,

$B_1 = a_1$, $B_2 = a_2$, $B_{-1} = B_0 = X_0 = \emptyset$.

Some initial values produced by the algorithm are:

$X_1 = a_1$,

$X_2 = a_1 a_2 a_1$,

$X_3 = a_2 a_3 a_1 a_2 a_1$,

$X_4 = a_1 a_3 a_1 a_4 a_2 a_3 a_1 a_2 a_1$,

$X_5 = a_2 a_4 a_2 a_5 a_1 a_3 a_1 a_4 a_2 a_3 a_1 a_2 a_1$.

This step-by-step approach is an algorithm by which the minimal crucial word for the set of prohibitions $S_1^n$ can be built. For $S_2^n$ such a construction gives an upper bound of the form $\exp(n/2)$ or, to be more exact,

$(3 - (n \bmod 2))\, 2^{\lceil n/2 \rceil} - 3$.
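The recursion for $X_n$ and the length formula can be compared directly. The sketch below is illustrative: letters are encoded as the integers $1, \ldots, n$ and the names are hypothetical. It builds $X_n$ from the recursion, checks that it is crucial with respect to $S_2^n$, and compares its length with $(3 - (n \bmod 2))\,2^{\lceil n/2 \rceil} - 3$ for small $n$.

```python
from collections import Counter

def abelian_square_free(word):
    """No factor XY with |X| = |Y| and equal content vectors V(X) = V(Y)."""
    n = len(word)
    return all(Counter(word[s:s + h]) != Counter(word[s + h:s + 2 * h])
               for s in range(n) for h in range(1, (n - s) // 2 + 1))

def crucial_S2(word, n):
    """Free from S_2^n, and appending any of the letters 1..n creates an abelian square."""
    return abelian_square_free(word) and all(
        not abelian_square_free(word + [a]) for a in range(1, n + 1))

def step_by_step_word(n):
    """X_n = B_{n-1} a_n X_{n-1}, with B_m = B_{m-2} a_m B_{m-2}, B_1 = a_1, B_2 = a_2,
    B_0 = X_0 = empty word; the letter a_i is encoded as the integer i."""
    B = {0: [], 1: [1], 2: [2]}
    for m in range(3, n):
        B[m] = B[m - 2] + [m] + B[m - 2]
    X = []
    for i in range(1, n + 1):
        X = B[i - 1] + [i] + X
    return X

for n in range(1, 7):
    X = step_by_step_word(n)
    predicted = (3 - n % 2) * 2 ** ((n + 1) // 2) - 3   # ceil(n/2) == (n + 1) // 2
    print(n, len(X), len(X) == predicted, crucial_S2(X, n))  # the last two columns are True
```

For $n = 4, 5$ the lengths $9$ and $13$ coincide with $4n - 7$, while from $n = 6$ on the exponential construction is longer than the linear bound obtained below.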

We now give an upper bound that is a linear function. 

We introduce, as before, a system of included $i$-endings $\ell_1 \subset \ell_2 \subset \ldots \subset \ell_n$ (permuting the letters of the alphabet if necessary). We show that the passage from $\ell_{i-1}$ to $\ell_i$ is possible by adding only two symbols (letters of the alphabet $A$).

When we pass from $\ell_{i-1}$ to $\ell_i$, let the symbols $y$ and $z$ appear. The word $\ell_{i-1}$ may be written as $AB$, where $A$ is a certain word, $B$ consists of the letters of the word $A$ (in some order), and $B$ contains one letter $a_{i-1}$ fewer than $A$ does. Let $x$ be the last letter of the word $A$. Then $\ell_i$ may be written as $yzKxB$, where $A = Kx$. From the definition of $\ell_i$ we have the equation (of multisets of letters)

$y \cup z \cup K = B \cup x \cup a_i$,

which, by the definition of $K$ and $B$, is equivalent to

$2x \cup a_i = y \cup z \cup a_{i-1}$.

It follows that necessarily $x = a_{i-1}$ and either $y = a_{i-1}$, $z = a_i$ or $y = a_i$, $z = a_{i-1}$. Suppose $y = a_{i-1}$, $z = a_i$.
For example, we have the following crucial word for a 6-letter alphabet:

$a_4 a_5 a_3 a_4 a_2 a_3 a_1 a_2 \mid a_6 a_4 a_3 a_2 a_1 a_2 a_3 a_4 a_6$

(the vertical line is drawn only for visual convenience).
This word is crucial and its length is equal to 17.
We consider the case of an arbitrary $n > 3$ and define the word $W$ as

$W = a_{n-2} a_{n-1} a_{n-3} a_{n-2} \ldots a_1 a_2 \mid a_n a_{n-2} a_{n-3} \ldots a_2 a_1 a_2 \ldots a_{n-3} a_{n-2} a_n$.

Then $|W| = 2(n - 2) + (n - 1) + (n - 2) = 4n - 7$.
Let us verify that the word W is crucial. 

If we write the letters $a_1$, $a_2$ or $a_n$ to the right of the word $W$, we obviously obtain prohibited subwords. Let $2 < i < n$. Then, if we write the letter $a_i$, we obtain the prohibition

$a_{i-1} a_i \ldots a_1 a_2 a_n a_{n-2} \ldots a_i \mid a_{i-1} \ldots a_2 a_1 a_2 \ldots a_{n-2} a_n a_i$,

since the content vectors of the left and right subwords with respect to the vertical line are equal.
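The word $W$ and both properties used in the proof (freeness from $S_2^n$ and cruciality) can be verified by brute force for small $n$. A sketch, assuming the letters are encoded as integers $1, \ldots, n$ and using hypothetical function names; it is a finite check for a few values of $n$, not a substitute for the argument in the text.

```python
from collections import Counter

def abelian_square_free(word):
    """No factor XY with |X| = |Y| and V(X) = V(Y)."""
    n = len(word)
    return all(Counter(word[s:s + h]) != Counter(word[s + h:s + 2 * h])
               for s in range(n) for h in range(1, (n - s) // 2 + 1))

def crucial_S2(word, n):
    """Free from S_2^n, and every one-letter extension contains an abelian square."""
    return abelian_square_free(word) and all(
        not abelian_square_free(word + [a]) for a in range(1, n + 1))

def W(n):
    """a_{n-2} a_{n-1} a_{n-3} a_{n-2} ... a_1 a_2 | a_n a_{n-2} ... a_1 ... a_{n-2} a_n (n >= 3)."""
    left = []
    for j in range(n - 2, 0, -1):          # the pairs a_j a_{j+1}, j = n-2 down to 1
        left += [j, j + 1]
    right = [n] + list(range(n - 2, 0, -1)) + list(range(2, n - 1)) + [n]
    return left + right

print(W(6))  # [4, 5, 3, 4, 2, 3, 1, 2, 6, 4, 3, 2, 1, 2, 3, 4, 6]
for n in range(3, 9):
    w = W(n)
    print(n, len(w) == 4 * n - 7, crucial_S2(w, n))
```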

Before proving that $W \in \overline{S_2^n}$, we make the following remark.

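In the word $W$ we have $\ell_n \subset \ell_1 \subset \ldots \subset \ell_{n-2} \subset \ell_{n-1}$. Substituting $a_1$ for $a_n$, $a_2$ for $a_1$, ..., $a_n$ for $a_{n-1}$, we obtain another word

$U = a_{n-1} a_n \ldots a_2 a_3 \mid a_1 a_{n-1} \ldots a_3 a_2 a_3 \ldots a_{n-1} a_1$,

for which $\ell_1 \subset \ell_2 \subset \ldots \subset \ell_n$.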

In both cases (before and after the substitution of letters of the alphabet) we have a construction of a crucial word (which will be proved below), and hence the same upper bound on the length of a minimal crucial word.
For $W$ it is more convenient to show that $W \in \overline{S_2^n}$.
We rewrite $W$ with the marks $(1), (2), \ldots, (2n - 4)$, which number the gaps between its letters, as follows:

$(2n - 4)\, a_{n-2}\, (2n - 5)\, a_{n-1} \ldots (2)\, a_1\, (1)\, a_2 \mid a_n a_{n-2} \ldots a_2 a_1 a_2 \ldots a_{n-2} a_n$.

In a possible prohibition we mark the left and right bounds. Note that the length of a prohibition is an even number, and each letter must occur an even number of times in a prohibition. The left bound of the prohibition must lie to the right of the mark $(2n - 5)$, since the letter $a_{n-1}$ occurs in $W$ only once.

It must lie to the left of the mark $(1)$, since to the right of the mark $(1)$ there is only one letter $a_1$.

Note that if $m$ is even, then $(m)$ is not the left bound of a possible prohibition. Indeed, in this case two variants are possible:

1) the prohibition does not cover the left letter $a_n$;

2) the prohibition covers the left letter $a_n$.

In the second case we do not have a prohibition, since if the prohibition begins at an even mark, then it cannot cover the second $a_n$.
In the first case the right bound of the prohibition lies to the left of $a_n$, hence the letter $a_{m/2 + 1}$ occurs in the prohibition only once.

Suppose the prohibition begins at the mark $(m)$ and $m$ is odd.
There are two possible cases.

1) The prohibition does not cover the left letter $a_n$ (this case is impossible, since the letter $a_{(m+3)/2}$ occurs in the prohibition once).

2) The prohibition covers the left $a_n$. Then it covers the right $a_n$ too, and the letter $a_{(m+3)/2}$ occurs an odd number of times in the prohibition. So $W \in \overline{S_2^n}$, and hence $L_{\min}(S_2^n) \le 4n - 7$ for $n > 2$.

We now give a lower bound.

Since the length of a minimal crucial word must be odd, and the passage from $\ell_i$ to $\ell_{i+1}$ requires at least two letters, a trivial lower bound on the length of a minimal crucial word is $2n - 1$.

Let us now improve the lower bound. Obviously, a minimal crucial word in which $\ell_1 \subset \ell_2 \subset \ldots \subset \ell_n$ has an even number of occurrences of the letter $a_i$ for $i = 1, \ldots, n - 1$ and an odd number of occurrences of the letter $a_n$. The word $U$ has two letters $a_1$, two letters $a_2$, one letter $a_n$ and four of every other letter, that is, $2 + 2 + 1 + 4(n - 3) = 4n - 7$ letters. From Proposition 3 we know that there is no crucial word with fewer letters, hence the word $U$ gives the lower bound on the length of a minimal crucial word. □
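For very small alphabets the value $4n - 7$ can also be confirmed by exhaustive search. This sketch is an illustration, not part of the proof, and its names are hypothetical. It explores abelian-square-free words level by level, so the first crucial word it meets has minimal length. Only the factors ending at the newly appended letter are re-checked, since every prefix of a free word is free.

```python
from collections import Counter

def ends_with_abelian_square(word):
    """True if some suffix XY of `word` has |X| = |Y| and V(X) = V(Y)."""
    n = len(word)
    return any(Counter(word[n - 2 * h:n - h]) == Counter(word[n - h:])
               for h in range(1, n // 2 + 1))

def shortest_crucial_length_S2(n):
    """Breadth-first search over words free from S_2^n, shortest words first."""
    letters = range(1, n + 1)
    level = [()]                                   # all free words of the current length
    while True:
        next_level = []
        for w in level:
            ext = [w + (a,) for a in letters if not ends_with_abelian_square(w + (a,))]
            if w and not ext:                      # free, but every extension is prohibited
                return len(w)
            next_level.extend(ext)
        level = next_level

print([shortest_crucial_length_S2(n) for n in (3, 4)])  # [5, 9], i.e. 4n - 7
```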



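4 The Set of Prohibitions $S_3^{n,k}$

Theorem 6. We have

$L_{\min}(S_3^{n,k}) = 2k + 1$.

Proof. An upper bound is given by the construction $p_1 p_2 \ldots p_k\, x\, p_1 p_2 \ldots p_k$, where $x, p_i \in A$, $i = 1, \ldots, k$, and $x \ne p_i$. On the other hand, a word of length at most $2k$ cannot be crucial, since appending a letter to it gives a word of length at most $2k + 1$, while every word in $S_3^{n,k}$ has length at least $2k + 2$. □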
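Theorem 6 can be sanity-checked by brute force. In this sketch the names are hypothetical and letters are single characters; `free_from_S3` tests every factor $XY$ with $|X| = |Y| \ge k + 1$. We check one word of the form $p_1 \ldots p_k x p_1 \ldots p_k$, the shortest crucial length for small $k$ over a two-letter alphabet, and, for a one-letter alphabet, that the word consisting of $2k + 1$ copies of the letter is the only crucial word.

```python
from itertools import product

def free_from_S3(word, k):
    """No factor XY with |X| = |Y| >= k + 1 and Hamming distance d(X, Y) <= k."""
    n = len(word)
    for s in range(n):
        for h in range(k + 1, (n - s) // 2 + 1):
            x, y = word[s:s + h], word[s + h:s + 2 * h]
            if sum(a != b for a, b in zip(x, y)) <= k:
                return False
    return True

def crucial_S3(word, k, alphabet):
    return free_from_S3(word, k) and all(not free_from_S3(word + a, k) for a in alphabet)

def shortest_crucial_length_S3(k, alphabet):
    length = 1
    while not any(crucial_S3("".join(w), k, alphabet)
                  for w in product(alphabet, repeat=length)):
        length += 1
    return length

print(crucial_S3("12" + "3" + "12", 2, "123"))                   # True: p1 p2 x p1 p2, k = 2
print([shortest_crucial_length_S3(k, "12") for k in (1, 2, 3)])  # [3, 5, 7]
print([m for m in range(1, 12) if crucial_S3("1" * m, 3, "1")])  # [7]
```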



Remark 7. The crucial word with respect to $S_3^{1,k}$ is unique and its length is $2k + 1$.



Theorem 8. We have

$L_{\max}(S_3^{2,k}) = 3k + 3$.

Proof. Let
Moreover, let us consider an arbitrary crucial word $A$ with respect to $S_3^{2,k}$ of length greater than $3k + 3$. It is easy to see that if $a_1 a_2 \ldots a_{k+1}$ are the first $k + 1$ letters of $A$, then the next $k + 1$ letters of $A$ must be $\bar{a}_1 \bar{a}_2 \ldots \bar{a}_{k+1}$, where $\bar{a}$ denotes the letter of the two-letter alphabet different from $a$, because otherwise the first $2k + 2$ letters of $A$ form a prohibited subword. By the same argument we can show that

$A = a_1 a_2 \ldots a_{k+1} \bar{a}_1 \bar{a}_2 \ldots \bar{a}_{k+1} a_1 a_2 \ldots a_{k+1} \bar{a}_1 \ldots$.

Let us consider the subwords $A_i$ of $A$ of length $2k + 4$ which start at the $i$th letter, where $1 \le i \le k$:

$A_i = (a_i \ldots a_{k+1} \bar{a}_1 \ldots \bar{a}_i)(\bar{a}_{i+1} \ldots \bar{a}_{k+1} a_1 \ldots a_{i+1})$,

where each of the two bracketed subwords has length $k + 2$. If $a_i \ne a_{i+1}$, then these two subwords coincide in the first and in the last positions, so they differ in at most $k$ positions; hence $A_i$ is prohibited. So we must have $a_i = a_{i+1}$ for $i = 1, \ldots, k$.

Without loss of generality we can assume that $a_1 = 1$, so, writing $x^m$ for the word consisting of $m$ letters $x$,

$A = 1^{k+1} 2^{k+1} 1^{k+1} 2 \ldots$.

It is easy to see that if the length of $A$ is greater than $3k + 3$, then $A$ has a prohibited subword of length $2k + 4$:

$A = 1^k\, (1\, 2^{k+1})(1^{k+1}\, 2) \ldots$,

since $d(1\, 2^{k+1}, 1^{k+1}\, 2) = k$ (here and below, the brackets show the position of a prohibited subword and, in particular, the positions of its parts corresponding to $X$ and $Y$ in the definition of the set of prohibitions $S_3^{n,k}$).

So $L_{\max}(S_3^{2,k}) \le 3k + 3$.

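To prove the theorem it is sufficient to check that there are no prohibited subwords in the word $A = 1^{k+1} 2^{k+1} 1^{k+1}$ (this word is crucial: appending $1$ creates the prohibited subword $(2^k 1)(1^{k+1})$, and appending $2$ creates $(1\, 2^{k+1})(1^{k+1} 2)$).

Obviously, the left end of a possible prohibition can lie only in the left block $1^{k+1}$: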



$(1^{j}\, 2^{i})(2^{k-i+1}\, 1^{2i+j-k-1})$

with

$j + i \ge k + 1$.   (1)



Two cases are possible:

1. $j \ge k - i + 1$;

2. $j < k - i + 1$.

In the first case the left and right parts of the prohibition do not coincide in the first $k - i + 1$ letters and in the last $i$ letters, that is, they differ in $k + 1$ letters. So this case is impossible.

In the second case they do not coincide in the first $j$ letters and in the last $2i + j - k - 1$ letters. Hence they differ in $2(i + j) - k - 1$ letters, which according to (1) is greater than or equal to $k + 1$.

It follows that the word $1^{k+1} 2^{k+1} 1^{k+1}$ does not contain a prohibition, and thus the theorem is proved. □
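For the binary case the bound $3k + 3$ can be confirmed for small $k$ by enumerating all free words with a depth-first search. The sketch below uses hypothetical names and assumes the set is complete, since otherwise the search would not terminate. It extends a free word one letter at a time and re-checks only the factors that end at the new letter.

```python
def suffix_ok(word, k):
    """Assuming word[:-1] is free from S_3^{n,k}, check only the factors XY ending at the last letter."""
    n = len(word)
    for h in range(k + 1, n // 2 + 1):
        x, y = word[n - 2 * h:n - h], word[n - h:]
        if sum(a != b for a, b in zip(x, y)) <= k:
            return False
    return True

def is_free(word, k):
    """Free from S_3^{n,k}: every prefix passes the incremental suffix check."""
    return all(suffix_ok(word[:m], k) for m in range(1, len(word) + 1))

def longest_free_binary_length(k, alphabet="12"):
    """Depth-first search over all free words; terminates only if the set is complete."""
    best, stack = 0, [""]
    while stack:
        w = stack.pop()
        best = max(best, len(w))
        stack.extend(w + a for a in alphabet if suffix_ok(w + a, k))
    return best

for k in (1, 2, 3):
    A = "1" * (k + 1) + "2" * (k + 1) + "1" * (k + 1)
    crucial = is_free(A, k) and all(not suffix_ok(A + a, k) for a in "12")
    print(k, longest_free_binary_length(k), 3 * k + 3, crucial)  # k, 3k+3, 3k+3, True
```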



Theorem 9 (Incompleteness). The set of prohibitions $S_3^{n,k}$ is incomplete for $n \ge 3$.

Proof. Since the alphabet $A$ is finite, there is no trivial solution of the problem (such as taking all letters of $A$ and obtaining an infinite sequence with the required properties). So, to prove the incompleteness of the set $S_3^{n,k}$, we have to show the existence of an infinite word which is free from the set of prohibitions $S_3^{n,k}$.

We consider the case $n = 3$ and the alphabet $A = \{1, 2, 3\}$, since the incompleteness of the set of prohibitions $S_3^{n,k}$ for $n > 3$ follows from the incompleteness of the set of prohibitions for $n = 3$.

Let $B = \{a, b, c\}$ be an alphabet, and let $B^*$ be the set of all words in the alphabet $B$.

We define the mapping $f$ as follows:

$1^{k+1} \to a$, $2^{k+1} \to b$, $3^{k+1} \to c$.

The domain of the mapping $f$ is the set $C^*$ of words in the alphabet

$C = \{1^{k+1}, 2^{k+1}, 3^{k+1}\}$.

The image of the mapping $f$ is the set $B^*$.

Let the set of prohibitions be $S' = \{XX \mid X \in B^*\}$. Obviously, the set $S'$ coincides with the set $S_1^3$ whenever $A = B$.

It is known […] that for the alphabet $B$ there exists an infinite sequence $L'$ which is free from the set of prohibitions $S'$. The sequence $L'$ is built by iteration of the morphism

$a \to abc$,
$b \to ac$,
$c \to b$.






The morphism iteration procedure is as follows.

We start from the letter $a$ and substitute it with $abc$. Then we substitute each letter in $abc$ by the rule above, obtaining $abcacb$ after this step, and so on. Executing this procedure an infinite number of times gives the sequence $L'$.
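The iteration of the morphism is easy to reproduce. This sketch (hypothetical names) applies $a \to abc$, $b \to ac$, $c \to b$ a fixed number of times and tests the resulting prefix for squares by brute force. A finite check of this kind is only a sanity check, not a proof that $L'$ is square-free.

```python
MORPHISM = {"a": "abc", "b": "ac", "c": "b"}

def iterate_morphism(steps, start="a"):
    """Apply the substitution a -> abc, b -> ac, c -> b to every letter, `steps` times."""
    word = start
    for _ in range(steps):
        word = "".join(MORPHISM[ch] for ch in word)
    return word

def is_square_free(word):
    """Brute force: no factor of the form XX."""
    n = len(word)
    return all(word[s:s + h] != word[s + h:s + 2 * h]
               for s in range(n) for h in range(1, (n - s) // 2 + 1))

print(iterate_morphism(1), iterate_morphism(2), iterate_morphism(3))  # abc abcacb abcacbabcbac
prefix = iterate_morphism(8)
print(len(prefix), is_square_free(prefix))                             # 384 True
```

Because the image of $a$ begins with $a$, each iterate is a prefix of the next, so these words are prefixes of the infinite sequence $L'$.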

Let us prove that the sequence $L = f^{-1}(L')$ does not contain words prohibited by $S_3^{3,k}$.

We prove the statement by considering $L$ and all possible positions of words prohibited by $S_3^{3,k}$.

The sequence $L$ is built up from the letters of the alphabet $C$, in other words from the blocks $x^{k+1}$, where $x \in \{1, 2, 3\}$. This means that there are only three different cases for the position of a possible prohibition in $L$.



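Case 1. $x^{k+1} \ldots y^{k+1}\, z^{k+1} \ldots$;

Case 2. $x^{i} x^{k-i+1} \ldots y^{k+1}\, z^{k+1} \ldots t^{k-i+1} t^{i}$, where $0 < i < k + 1$;

Case 3. $x^{i} x^{k-i+1} \ldots y^{\ell}\, y^{k-\ell+1} \ldots t^{k-j+1} t^{j}$, where $0 < \ell < k + 1$.

In Cases 1 and 2 the border between the left and right parts of the prohibition lies between the blocks $y^{k+1}$ and $z^{k+1}$; in Case 3 it lies inside the block $y^{k+1}$, between $y^{\ell}$ and $y^{k-\ell+1}$.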

Now we will consider these cases and show that each of them is impossible. 

Case 1. Let $P$ denote the prohibited subword (prohibition) under consideration, and let $L$ and $R$ denote the left and the right parts of $P$ respectively.

It is obvious that $L$ and $R$ have the same number of blocks. Moreover, the $i$th block of $L$ (from left to right) is equal to the $i$th block of $R$, because otherwise $L$ and $R$ do not coincide in at least $k + 1$ letters, which contradicts the fact that $P \in S_3^{3,k}$. So we have $P = WW$ for some $W \in C^*$.

Now $f(P) = f(W)f(W)$ is a subword of $L'$. But $f(W)f(W) \in S'$, which is impossible by the properties of $L'$. So Case 1 is impossible.

We note that an important consequence of Case 1 is the following: if $x^{k+1} y^{k+1}$ is a subword of $L$, then $x \ne y$.

Case 2. If there are no letters between $x^{k-i+1}$ and $y^{k+1}$, that is,

$P = x^{k-i+1} y^{k+1} z^{k+1} t^{k-i+1}$,

then we must have $x = z$, because otherwise $x \ne z$ and $y \ne z$, which gives that $L$ and $R$ differ in the first $k + 1$ positions; but this contradicts $P \in S_3^{3,k}$.
By the same argument we have $y = t$, so

$P = x^{k-i+1} y^{k+1} x^{k+1} y^{k-i+1}$.

But if we now consider $f(L) = L'$, then, since $P$ lies inside the word $x^{k+1} y^{k+1} x^{k+1} y^{k+1}$, the sequence $L'$ has

$P' = f(x^{k+1}) f(y^{k+1}) f(x^{k+1}) f(y^{k+1})$

as a subword, which is impossible since $P' \in S'$.

So there is some nonempty subword in $L$ between $x^{k-i+1}$ and $y^{k+1}$, and $P$ can be written as

$P = x^{k-i+1} x_1^{k+1} \ldots x_p^{k+1} y^{k+1} z^{k+1} z_1^{k+1} \ldots z_p^{k+1} t^{k-i+1}$.

There are two possible subcases here.
1. $x = z$.

Since $x \ne x_1$, we have $x_1 \ne z$. If $x_1 \ne z_1$, then $L$ and $R$ differ in $k + 1$ positions starting from the $(k - i + 2)$th position, which is impossible since $P \in S_3^{3,k}$. So $x_1 = z_1$.

In the same way, for each of $x_2, x_3, \ldots, x_p, y$ we obtain that

$P = z^{k-i+1} z_1^{k+1} \ldots z_p^{k+1} t^{k+1} z^{k+1} z_1^{k+1} \ldots z_p^{k+1} t^{k-i+1}$,

which leads us to the fact that $L$ has a subword $WW$ for some $W \in C^*$; hence $L'$ has a subword $f(W)f(W)$, which is impossible.
So subcase 1 is impossible.
2. $x \ne z$.

If $x_1 \ne z$, then $L$ and $R$ differ in $k + 1$ positions starting from the first position, which is impossible since $P \in S_3^{3,k}$. So $x_1 = z$.

If $x_2 \ne z_1$, then $L$ and $R$ differ in $k + 1$ positions starting from the $(k + 2)$th position, which is impossible by the same argument as above. So $x_2 = z_1$, and so on.

We have

$P = x^{k-i+1} z^{k+1} z_1^{k+1} \ldots z_p^{k+1} z^{k+1} z_1^{k+1} \ldots z_p^{k+1} t^{k-i+1}$.

Applying $f$ to $L$ gives us a subword $P'$ of $L'$,

$P' = f(z^{k+1}) f(z_1^{k+1}) \ldots f(z_p^{k+1})\, f(z^{k+1}) f(z_1^{k+1}) \ldots f(z_p^{k+1})$,

which is prohibited in $L'$ by $S'$.

We have shown that subcase 2 is impossible, and hence Case 2 is impossible.

Case 3. We can assume that $i \ne \ell$ and $\ell \ne k + 1$, because otherwise we deal with either Case 1 or Case 2, which are impossible.

We suppose that $i > \ell$ (the case $i < \ell$ can be considered in the same way).

If there are no letters between $y^{k-\ell+1}$ and $t^{k-j+1}$, then we have either

$P = x^{k-i+1} y^{\ell} y^{k-\ell+1} t^{k-j+1}$

or

$P = x^{k-i+1} z^{k+1} y^{\ell} y^{k-\ell+1} t^{k-j+1}$.

In the first of these cases we have $x \ne y$ and $y \ne t$, which gives us that $L$ and $R$ do not coincide in at least $k + 1$ letters, but this contradicts $P \in S_3^{3,k}$.

In the second case we must have $z = t$, because otherwise, since $z \ne y$ and $t \ne y$, $L$ and $R$ do not coincide in the last $k + 1$ letters, which is impossible. So in the second case we have

$P = x^{k-i+1} t^{k+1} y^{\ell} y^{k-\ell+1} t^{k-j+1}$.

If $x \ne y$, then $L$ and $R$ do not coincide in the first $k - i + 1$ positions and in the last $\ell$ positions, that is, they do not coincide in at least $k + 1$ positions, which is impossible. So $x = y$.

Now applying $f$ to $L$ shows that $L'$ has a subword

$P' = f(x^{k+1}) f(t^{k+1}) f(x^{k+1}) f(t^{k+1})$,

which is impossible.

So there is some nonempty subword in $R$ between $y^{k-\ell+1}$ and $t^{k-j+1}$, and $P$ can be written in the form

$P = x^{k-i+1} L_1 \ldots L_p\, y^{\ell}\, y^{k-\ell+1} R_1 \ldots R_{p'}\, t^{k-j+1}$,

where $L_s, R_m \in C$ for $1 \le s \le p$, $1 \le m \le p'$, and either $p = p'$ or $p = p' + 1$. We define $A(L_s) = x_s$ if $L_s = x_s^{k+1}$. In the same way we define $A(R_m)$.

Now we have that either $p = p'$ or $p = p' + 1$. Each of these cases has two possible subcases: either $x = y$ or $x \ne y$. Let us consider the case $p = p' + 1$. The other case can be treated by similar reasoning. Thus we must consider the following subcases a) and b):

a) $x = y$. It must be that $L_1 = R_1$, because otherwise $L$ and $R$ differ in $k + 1$ positions starting from the $(k - i + 2)$th position. Then we consider $L_2, L_3, \ldots, L_p$ one by one. One can see that in this subcase

$P = y^{k-i+1} R_1 \ldots R_{p'}\, t^{k+1}\, y^{\ell}\, y^{k-\ell+1} R_1 \ldots R_{p'}\, t^{k-j+1}$,

and $L$ has $WW$ as a subword, where $W = y^{k+1} R_1 \ldots R_{p'}\, t^{k+1}$, which is impossible.

b) $x \ne y$. There are two special subcases here, namely either $A(L_1) = y$ or $L_1 = R_1$.

If $A(L_1) = y$, then

$P = x^{k-i+1} y^{k+1} R_1 \ldots R_{p'}\, y^{\ell}\, y^{k-\ell+1} R_1 \ldots R_{p'}\, t^{k-j+1}$,

and $L$ has $WW$ as a subword, where $W = y^{k+1} R_1 \ldots R_{p'}$, which is impossible.

So $L_1 = R_1$. In this case we have

$P = x^{k-i+1} R_1 \ldots R_{p'}\, t^{k+1}\, y^{\ell}\, y^{k-\ell+1} R_1 \ldots R_{p'}\, t^{k-j+1}$.

Since $y \ne x$, $y \ne A(R_1)$ and $y \ne t$, $L$ and $R$ do not coincide in the first $k - \ell + 1$ positions and in the last $\ell$ positions, so they do not coincide in $k + 1$ positions, which contradicts $P \in S_3^{3,k}$.

We have shown that Case 3 is impossible.

We have proved that the infinite word $L$ contains no word from the set $S_3^{3,k}$ as a subword; therefore $S_3^{n,k}$ is incomplete for $n \ge 3$. □
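The construction of $L$ can be tested on a finite prefix. Assuming that the letters $a, b, c$ are replaced by blocks of $k + 1$ copies of `'1'`, `'2'`, `'3'` respectively, the following sketch (hypothetical names; again only a finite sanity check, not a proof) builds a prefix of $L = f^{-1}(L')$ and checks that it contains no factor prohibited by $S_3^{3,k}$.

```python
MORPHISM = {"a": "abc", "b": "ac", "c": "b"}
BLOCK = {"a": "1", "b": "2", "c": "3"}   # f^{-1}: a -> 1^{k+1}, b -> 2^{k+1}, c -> 3^{k+1}

def prefix_of_L_prime(steps):
    """Prefix of L' obtained by iterating the morphism `steps` times from the letter a."""
    word = "a"
    for _ in range(steps):
        word = "".join(MORPHISM[ch] for ch in word)
    return word

def blow_up(word, k):
    """Replace every letter of L' by the corresponding block of k + 1 equal letters."""
    return "".join(BLOCK[ch] * (k + 1) for ch in word)

def free_from_S3(word, k):
    """No factor XY with |X| = |Y| >= k + 1 and Hamming distance d(X, Y) <= k."""
    n = len(word)
    for s in range(n):
        for h in range(k + 1, (n - s) // 2 + 1):
            if sum(a != b for a, b in zip(word[s:s + h], word[s + h:s + 2 * h])) <= k:
                return False
    return True

for k in (1, 2, 3):
    L = blow_up(prefix_of_L_prime(5), k)
    print(k, len(L), free_from_S3(L, k))
```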



5 The Complexity of Problems on Completeness of Sets of Words

It is known […] that the complexity of deciding whether or not an arbitrary set of prohibited words $S$ is complete (or blocking) is $O(|S| \cdot n)$, where $n$ is the greatest length of a word in $S$.
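The cited algorithms are not reproduced here. As an assumption-laden sketch of one standard way to decide completeness, one can build an Aho-Corasick automaton for $S$ and look for a cycle among the states that do not signal an occurrence of a prohibited word; for a complete set, the longest path in that acyclic part gives the greatest length of a free word. The names are hypothetical, prohibited words are assumed nonempty and written over the given alphabet, and the recursive search is meant for small inputs.

```python
from collections import deque

def build_automaton(prohibited, alphabet):
    """Aho-Corasick automaton: delta[s][ch] is the next state, and bad[s] is True
    when the text read so far ends with some prohibited word."""
    goto, bad = [{}], [False]
    for word in prohibited:                      # build the trie
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                bad.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        bad[state] = True
    fail = [0] * len(goto)
    delta = [{} for _ in goto]
    queue = deque()
    for ch in alphabet:                          # depth-1 states fail to the root
        child = goto[0].get(ch)
        delta[0][ch] = child if child is not None else 0
        if child is not None:
            queue.append(child)
    while queue:                                 # breadth-first: failure targets are finished first
        state = queue.popleft()
        bad[state] = bad[state] or bad[fail[state]]
        for ch in alphabet:
            child = goto[state].get(ch)
            if child is None:
                delta[state][ch] = delta[fail[state]][ch]
            else:
                fail[child] = delta[fail[state]][ch]
                delta[state][ch] = child
                queue.append(child)
    return delta, bad

def longest_free_length(prohibited, alphabet):
    """None if S is incomplete, otherwise the greatest length of a word free from S."""
    delta, bad = build_automaton(prohibited, alphabet)
    NEW, ACTIVE, DONE = 0, 1, 2
    status = [NEW] * len(delta)
    longest = [0] * len(delta)                   # longest free continuation from a good state

    def visit(state):
        status[state] = ACTIVE
        for ch in alphabet:
            nxt = delta[state][ch]
            if bad[nxt]:
                continue                         # reading ch would complete a prohibited word
            if status[nxt] == ACTIVE:
                return False                     # cycle of good states: arbitrarily long free words
            if status[nxt] == NEW and not visit(nxt):
                return False
            longest[state] = max(longest[state], 1 + longest[nxt])
        status[state] = DONE
        return True

    return longest[0] if visit(0) else None

print(longest_free_length({"12", "23", "31", "32", "11", "22", "33"}, "123"))         # 3
print(longest_free_length({"123", "13", "14", "11", "22", "33", "44"}, "1234"))      # None
```

The automaton has at most the total length of the words of $S$ plus one states, and each state is processed once per letter, which is in line with the polynomial bound quoted above.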

It is interesting in its own right to be able to recognise effectively (in polynomial time) whether a set is complete, but it is also useful to give a more detailed characterisation of the set of words $S$, in particular to find the greatest length of a word that is free from $S$. The set $A^n$ is the set of all words in the alphabet $A$ whose length is equal to $n$. If $S \subseteq A^n$ and $L(n) = \max L(S)$, where $L(S)$ is the greatest length of a word that is free from $S$, then […] we have

$L(n) = |A|^{n-1} + n - 2 = C(n) + n - 1$.

Here $C(n)$ is the greatest length of a simple path in the de Bruijn graph of order $n$ that has no chords and does not go through the vertices with loops, which correspond to the constant words $(x, \ldots, x)$ with $x \in A$.

One can find all words that are free from $S$, in particular all crucial words, simply by considering all words of length less than or equal to $L(S)$ and checking for each word whether it is free from $S$. Such an algorithm is not efficient, since it can require considering $|A|^{L(S)}$ words.

The question of determining the possible lengths of words that are free from $S$, in particular of crucial words, can be formulated as a problem of recognising properties of "languages of prohibitions" in the terminology of the theory of NP-completeness […].

Problem A:

Given: An arbitrary set of words $S$ and a natural number $\ell$.

The question: Does there exist a word of length at least $\ell$ that is free from $S$?

In order to compare, we formulate the problem of completeness of a set of words $S$ in the same form.
Problem B:

Given: An arbitrary set of words $S$ in an alphabet $A$.

The question: Does there exist $t \in \mathbb{N}$ such that $|X| < t$ for every word $X$ that is free from $S$?

Considering problems A and B as problems of recognising properties of finite sets $S$, we observe that problem B is a question of the existence of a bound on the length of the words that are free from $S$. This problem, as we have already mentioned, can be solved effectively with complexity of order $|S| \cdot n$. At the same time, problem A is a question of determining this bound. We will show that problem A, as opposed to problem B, is NP-complete.

Research on the problems of completeness of sets of words and languages of prohibited subwords was begun by different authors […] in the 1970s. The interest in the general question in this area arose from considerations of different types of special problems, in particular in coding theory, the combinatorics of symbolic sequences, number theory and problems of Ramsey type (for instance, arithmetic progressions in partitions of the natural numbers). For algebraic problems it is more typical to study the avoidance of infinite sets $S$ that are defined by prohibitions of words (called terms) in an alphabet of variables that can themselves be words […]. Different problems on sequences without repetitions, under variations of the concept of "strong" or "weak" repetition of subwords, are typical examples of problems of this class. Finally, we observe that problems A and B for infinite sets $S$ do not make sense unless one considers particular constructive methods for generating the set $S$.
Let $A = \{a_1, \ldots, a_n\}$ be an alphabet and let $A_l$ be the set of all words over the alphabet $A$ whose length is less than or equal to $l$. We also assume that the empty word belongs to $A_l$ and that $S_1$ is an arbitrary set such that $S_1 \subseteq A_2 \setminus A_1$. We define $S_2$ by

$S_2 = \{xXx \mid x \in A,\ X \in A_{n-1}\}$.

So the set $S_2$ contains all possible words of length less than or equal to $n + 1$ whose first letter coincides with their last letter. Let $S = S_1 \cup S_2$.
We now consider an "auxiliary" problem A'.
Problem A':

Given: A set $S$ of the type described above and a natural number $\ell$, $\ell \le n$.
The question: Does there exist a word of length at least $\ell$ that is free from $S$?

In the case of problem A', the restriction on $\ell$ is natural, because any word free from $S$ is free from $S_2$ and therefore consists of distinct letters of the alphabet, whence its length is less than or equal to $n$.

Checking whether a given word of length $\ell$ (a solution of A' that we have "guessed") is free from $S$ can be done in polynomial time. Indeed, freeness of the word from $S_2$ is equivalent to the absence of identical letters in the word (which can be checked in linear time), and freeness from $S_1$ is recognised by considering all subwords of length 2 (there are $\ell - 1$ such subwords) and checking for each of them whether it belongs to $S_1$ (in polynomial time).

We now introduce the problem of "the longest path in a graph", which is known to be NP-complete (see [6]).

Problem "path":

Given: A directed graph $G(V, E)$ and a natural number $\ell$, $\ell \le |V| = n$.

The question: Does there exist a simple directed path (without self-intersections in vertices) of length at least $\ell$?

One can obtain a correspondence between problem A' and problem "path" as follows. We associate the vertices $v_1, \ldots, v_n$ of $V(G)$ with the letters $a_1, \ldots, a_n$ of the alphabet $A$, and each edge $v_i v_j$ of $E(G)$ with the word $a_i a_j$. We form the set $S_1$ from all those words of $A_2$ that correspond to the edges of the graph that is the complement of $G$ with respect to the complete directed graph.

Now to any directed simple path $v_{i_1}, \ldots, v_{i_\ell}$ of length $\ell$ in $G$ there corresponds the word $a_{i_1} \ldots a_{i_\ell}$ of length $\ell$, whose consecutive letters correspond to the vertices in the order in which the path passes through them. This word is free from $S_1$ because $a_{i_j} a_{i_{j+1}} \notin S_1$ for any $j = 1, 2, \ldots, \ell - 1$. The word is free from the set $S_2$ as well, because in the path there is no repetition of vertices (a property of a simple path), and therefore $a_{i_1} \ldots a_{i_\ell}$ does not contain a subword of the form $a_i X a_i$ for any word $X$ and any letter $a_i \in A$.

Conversely, to any word in the alphabet $A$ that is free from $S$ there corresponds a path in $G(V, E)$ that goes through edges from $E(G)$, since the word is free from $S_1$, and that has no self-intersections, since the word is free from $S_2$.

Now the NP-completeness of problem A' and of the more general problem A follows from the NP-completeness of the problem "path".
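The reduction can be illustrated on small graphs. In this sketch the names are hypothetical, and the vertices $v_1, \ldots, v_n$ are encoded as $0, \ldots, n - 1$, with the letters being the same integers. We build $S_1$ from the complement of $G$ and compare, both by brute force, the greatest length of a word free from $S = S_1 \cup S_2$ with the greatest number of vertices on a simple directed path. As in the correspondence above, the length of a path is measured by its number of vertices.

```python
from itertools import permutations

def prohibited_pairs(n, edges):
    """S_1: the words a_i a_j (i != j) such that v_i v_j is an edge of the complement of G.
    Vertices and letters are encoded as 0, ..., n - 1; words are tuples."""
    return {(i, j) for i in range(n) for j in range(n) if i != j and (i, j) not in edges}

def free_from_S(word, s1):
    """Free from S_2: no repeated letter. Free from S_1: no consecutive pair lies in s1."""
    return len(set(word)) == len(word) and all(
        (word[t], word[t + 1]) not in s1 for t in range(len(word) - 1))

def longest_free_word(n, s1):
    """Brute force over arrangements of distinct letters (free words have distinct letters)."""
    best = 0
    for length in range(1, n + 1):
        if any(free_from_S(w, s1) for w in permutations(range(n), length)):
            best = length
    return best

def longest_simple_path(n, edges):
    """Greatest number of vertices on a simple directed path, by exhaustive search."""
    adj = {v: [w for (u, w) in edges if u == v] for v in range(n)}

    def extend(v, seen):
        return 1 + max((extend(w, seen | {w}) for w in adj[v] if w not in seen), default=0)

    return max(extend(v, {v}) for v in range(n))

for n, edges in [(4, {(0, 1), (1, 2), (2, 0), (2, 3)}), (4, {(0, 1), (2, 3)}), (3, set())]:
    print(longest_free_word(n, prohibited_pairs(n, edges)), longest_simple_path(n, edges))
# prints: 4 4, then 2 2, then 1 1
```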